S.-T. Yau College Student Mathematics Contests 2022
Probability and Statistics
Solve every problem.
Problem 1. Let $\{X_n\}$ be a sequence of Gaussian random variables. Suppose that $X$ is a random variable such that $X_n$ converges to $X$ in distribution as $n \to \infty$. Show that $X$ is also a (possibly degenerate, i.e., variance zero) Gaussian random variable.
Solution: Let $f_n(t)=\mathbb{E} e^{itX_n}$ be the characteristic function of $X_n$ and $f(t)=\mathbb{E} e^{itX}$ be that of $X$. There are real numbers $\mu_n$ and $\sigma_n$ such that $f_n(t)=e^{i\mu_n t-\sigma_n^2 t^2/2}$. We have $|f_n(t)|^2\to|f(t)|^2$, hence $e^{-\sigma_n^2 t^2}\to|f(t)|^2$ for all $t\in\mathbf{R}$. Since $f(t)\neq 0$ if $t$ is close to $0$, we must have $\sigma_n^2\to\sigma^2$ for some $\sigma\in[0,\infty)$. Now we have $e^{i\mu_n t}\to f(t)e^{\sigma^2 t^2/2}$ for all $t\in\mathbf{R}$ and by the dominated convergence theorem,
$$\lim_{n\to\infty}\int_0^t e^{i\mu_n s}\,ds=\int_0^t f(s)e^{\sigma^2 s^2/2}\,ds.$$
The integral on the right side does not vanish if $t$ is close, but not equal, to $0$ because the integrand is continuous and equal to $1$ at $s=0$. On the other hand,
$$i\mu_n\int_0^t e^{i\mu_n s}\,ds=e^{i\mu_n t}-1.$$
This gives
$$\mu_n=-i\left(f_n(t)e^{\sigma_n^2 t^2/2}-1\right)\left(\int_0^t e^{i\mu_n s}\,ds\right)^{-1},$$
from which we see that $\mu_n$ must converge to a finite number $\mu$. Finally,
$$f_n(t)\to e^{i\mu t-\sigma^2 t^2/2}=f(t)$$
and $X$ must be a (possibly degenerate) Gaussian random variable.
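For example, if $X_n\sim N(0,1/n)$, then $f_n(t)=e^{-t^2/(2n)}\to 1$ for every $t$, which is the characteristic function of the constant $0$; the limit is a degenerate Gaussian with $\mu=0$ and $\sigma^2=0$, which is why the degenerate case cannot be excluded.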
Problem 2. For two probability measures $\mu$ and $\nu$ on the real line $\mathbf{R}$, the total variation distance $\|\mu-\nu\|_{TV}$ is defined as
$$\|\mu-\nu\|_{TV}=\sup\{\mu(C)-\nu(C): C\in\mathcal{B}(\mathbf{R})\},$$
where $\mathcal{B}(\mathbf{R})$ is the $\sigma$-algebra of Borel sets on $\mathbf{R}$. Let $\mathcal{C}(\mu,\nu)$ be the space of couplings of the probability measures $\mu$ and $\nu$, i.e., the space of $\mathbf{R}^2$-valued random variables $(X,Y)$ defined on some (not necessarily the same) probability space $(\Omega,\mathcal{F},\mathbb{P})$ such that the marginal distributions of $X$ and $Y$ are $\mu$ and $\nu$, respectively. Show that
$$\|\mu-\nu\|_{TV}=\inf\{\mathbb{P}\{X\neq Y\}:(X,Y)\in\mathcal{C}(\mu,\nu)\}.$$
For simplicity you may assume that $\mu$ and $\nu$ are absolutely continuous with respect to the Lebesgue measure on $\mathbf{R}$.
Solution: (1) Let $C\in\mathcal{B}(\mathbf{R})$ and $(X,Y)\in\mathcal{C}(\mu,\nu)$. Then
$$\mu(C)-\nu(C)=\mathbb{P}\{X\in C\}-\mathbb{P}\{Y\in C\}\leq\mathbb{P}\{X\in C,\,Y\notin C\}\leq\mathbb{P}\{X\neq Y\}.$$
Taking the supremum over $C\in\mathcal{B}(\mathbf{R})$ and then the infimum over $(X,Y)\in\mathcal{C}(\mu,\nu)$ we obtain
$$\|\mu-\nu\|_{TV}\leq\inf\{\mathbb{P}\{X\neq Y\}:(X,Y)\in\mathcal{C}(\mu,\nu)\}.$$
(2) It is sufficient to construct a probability measure $\mathbb{P}\in\mathcal{C}(\mu,\nu)$ and a set $C\in\mathcal{B}(\mathbf{R})$ such that, for $(X,Y)\in\mathbf{R}^2$ under this probability,
$$\mu(C)-\nu(C)=\mathbb{P}\{X\neq Y\}.$$
The idea is to construct $\mathbb{P}$ so that the probability $\mathbb{P}\{X=Y\}$ is the largest possible under the condition that $(X,Y)\in\mathcal{C}(\mu,\nu)$. Let $m=\mu+\nu$, or simply take $m$ to be the Lebesgue measure if $\mu$ and $\nu$ are absolutely continuous with respect to the Lebesgue measure. We have $\mu=f_1\cdot m$ and $\nu=f_2\cdot m$ by the Radon-Nikodym theorem. Let $f=\min\{f_1,f_2\}=f_1\wedge f_2$. Define a probability measure $\mathbb{P}$ on $\mathbf{R}^2$ by
$$\mathbb{P}\{(X,Y)\in A\times B\}=\frac{1}{1-a}\int_{A\times B}\left(f_1(x)-f(x)\right)\left(f_2(y)-f(y)\right)m(dx)\,m(dy)+\int_{A\cap B}f(z)\,m(dz).$$
Here $a=\int_{\mathbf{R}}f(z)\,m(dz)$ and we assume that $a<1$; otherwise $a=1$ and $f_1=f_2$, and the case is trivial. Note that the first part is the product measure of $(f_1-f)\cdot m$ and $(f_2-f)\cdot m$ (up to a constant) and the second part is the probability measure $f\cdot m$ on the diagonal (identified with $\mathbf{R}$) of $\mathbf{R}^2$. Taking $B=\mathbf{R}$ (respectively $A=\mathbf{R}$) shows that the marginal distributions of $X$ and $Y$ are $\mu$ and $\nu$, so $\mathbb{P}\in\mathcal{C}(\mu,\nu)$. Since the diagonal of $\mathbf{R}^2$ is a null set for $m\times m$ (recall that $m$ is the Lebesgue measure under our simplifying assumption), the first part assigns no mass to the diagonal, hence
$$\mathbb{P}\{X=Y\}=\int_{\mathbf{R}}f(z)\,m(dz)=a,\qquad \mathbb{P}\{X\neq Y\}=1-a.$$
On the other hand, with $C=\{x: f_1(x)>f_2(x)\}$ we have
$$\mu(C)-\nu(C)=\int_C\left(f_1-f_2\right)dm=\int_{\mathbf{R}}\left(f_1-f\right)dm=1-a.$$
This shows that $\mu(C)-\nu(C)=\mathbb{P}\{X\neq Y\}$.
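As a concrete illustration, take $\mu$ uniform on $(0,1)$ and $\nu$ uniform on $(1/2,3/2)$, so that $f_1=\mathbf{1}_{(0,1)}$, $f_2=\mathbf{1}_{(1/2,3/2)}$, $f=\mathbf{1}_{(1/2,1)}$ and $a=1/2$. The coupling above lets $X=Y$ be uniform on $(1/2,1)$ with probability $1/2$, and otherwise draws $X$ uniform on $(0,1/2)$ and $Y$ uniform on $(1,3/2)$ independently; then $\mathbb{P}\{X\neq Y\}=1/2$, which matches $\mu(C)-\nu(C)=1/2$ for $C=(0,1/2)=\{f_1>f_2\}$.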
Problem 3. We throw a fair die repeatedly and independently. Let $\tau_{11}$ be the first time the pattern 11 (two consecutive 1's) appears and $\tau_{12}$ the first time the pattern 12 (1 followed by 2) appears.
(a) Calculate the expected value $\mathbb{E}\tau_{11}$.
(b) Which is larger, $\mathbb{E}\tau_{11}$ or $\mathbb{E}\tau_{12}$? It is sufficient to give an intuitive argument to justify your answer. You can also calculate $\mathbb{E}\tau_{12}$ if you wish.
Solution:
(a) Let $\tau_1$ be the first time the digit 1 appears. At this time, if the next result is 1, then $\tau_{11}=\tau_1+1$; if the next result is not 1, then the time is $\tau_1+1$ and we have to start all over again. This means
$$\mathbb{E}\tau_{11}=\frac{1}{6}\left(\mathbb{E}\tau_1+1\right)+\frac{5}{6}\left(\mathbb{E}\tau_1+1+\mathbb{E}\tau_{11}\right).$$
Solving for $\mathbb{E}\tau_{11}$ we have $\mathbb{E}\tau_{11}=6\left(\mathbb{E}\tau_1+1\right)$. We need to calculate $\mathbb{E}\tau_1$. The set $\{\tau_1\geq n\}$ is the event that none of the first $n-1$ results is 1, hence $\mathbb{P}\{\tau_1\geq n\}=(5/6)^{n-1}$ and
$$\mathbb{E}\tau_1=\sum_{n=1}^{\infty}\mathbb{P}\{\tau_1\geq n\}=\sum_{n=1}^{\infty}\left(\frac{5}{6}\right)^{n-1}=6.$$
It follows that $\mathbb{E}\tau_{11}=6(6+1)=42$.
(b) For either 11 or 12 to occur, we have to wait until the first 1 occurs. After that, if we want 11, the next digit needs to be 1; otherwise we have to start all over again (i.e., wait for the next 1). But if we want 12, the next digit needs to be 2; otherwise, we have to start all over again only if the next digit is 3 to 6, because if the next digit is 1, we already have a start on the pattern 12. It follows that the pattern 12 has a slight advantage and tends to occur earlier than 11. Thus we have $\mathbb{E}\tau_{12}\leq\mathbb{E}\tau_{11}$ (in fact the inequality is strict, as the calculation below shows).
We can also calculate $\mathbb{E}\tau_{12}$ directly. Let $\tau_1$ be as before and let $\sigma$ be the first time a digit not equal to 1 appears. After $\tau_1$ we wait until the first time a digit not equal to 1 appears; the additional waiting time has the same distribution as $\sigma$. With probability $1/5$ this digit is 2; with probability $4/5$ this digit is not 2, and then we have to start all over again. This means that
$$\mathbb{E}\tau_{12}=\frac{1}{5}\,\mathbb{E}(\tau_1+\sigma)+\frac{4}{5}\left(\mathbb{E}(\tau_1+\sigma)+\mathbb{E}\tau_{12}\right).$$
Hence $\mathbb{E}\tau_{12}=5\,\mathbb{E}\left(\tau_1+\sigma\right)$. We have seen $\mathbb{E}\tau_1=6$. On the other hand, $\{\sigma\geq n\}$ is the event that the first $n-1$ digits are 1, hence $\mathbb{P}\{\sigma\geq n\}=(1/6)^{n-1}$ and $\mathbb{E}\sigma=6/5$. It follows that
$$\mathbb{E}\tau_{12}=5\left(6+\frac{6}{5}\right)=36<42=\mathbb{E}\tau_{11}.$$
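As a sanity check, both expectations can be estimated by simulation. The following is a minimal Monte Carlo sketch in Python (the helper `first_hit` and the parameter choices are illustrative, not part of the solution):

```python
import random

def first_hit(pattern, rng):
    """Roll a fair die until `pattern` (a tuple of faces) first appears;
    return the 1-based index of the roll that completes the pattern."""
    rolls = []
    while True:
        rolls.append(rng.randint(1, 6))
        if tuple(rolls[-len(pattern):]) == pattern:
            return len(rolls)

rng = random.Random(0)
reps = 200_000
est_11 = sum(first_hit((1, 1), rng) for _ in range(reps)) / reps
est_12 = sum(first_hit((1, 2), rng) for _ in range(reps)) / reps
print(est_11, est_12)  # should be close to 42 and 36, respectively
```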
Problem 4. Let $\{X_n\}$ be a Markov chain on a discrete state space $S$ with transition function $p(x,y)$, $x,y\in S$. Suppose that there is a state $y_0\in S$ and a positive number $\theta$ such that $p(x,y_0)\geq\theta$ for all $x\in S$.
(a) Show that there is a positive constant $\lambda<1$ such that for any two initial distributions $\mu$ and $\nu$,
$$\sum_{y\in S}\left|\mathbb{P}_{\mu}\{X_1=y\}-\mathbb{P}_{\nu}\{X_1=y\}\right|\leq\lambda\sum_{y\in S}|\mu(y)-\nu(y)|.$$
(b) Show that the Markov chain has a unique stationary distribution $\pi$ and
$$\sum_{y\in S}\left|\mathbb{P}_{\mu}\{X_n=y\}-\pi(y)\right|\leq 2\lambda^n.$$
Solution:
(a) Let $\theta=\inf\{p(x,y_0):x\in S\}$. Then $0<\theta\leq 1$. For any two probability measures $\mu$ and $\nu$ on the state space $S$, we have
$$\sum_{y\in S}\left|\mathbb{P}_{\mu}\{X_1=y\}-\mathbb{P}_{\nu}\{X_1=y\}\right|=\sum_{y\in S}\left|\sum_{x\in S}\{\mu(x)-\nu(x)\}p(x,y)\right|.$$
For the term $y=y_0$ we can replace $p(x,y_0)$ by $p(x,y_0)-\theta$ because $\sum_{x\in S}\{\mu(x)-\nu(x)\}=1-1=0$. After this replacement, we take the absolute value of every term and exchange the order of summation. Using the fact that $p(x,y_0)-\theta\geq 0$ we have
$$\sum_{y\in S}\left|\mathbb{P}_{\mu}\{X_1=y\}-\mathbb{P}_{\nu}\{X_1=y\}\right|\leq\left[\sum_{y\in S}p(x,y)-\theta\right]\cdot\sum_{x\in S}|\mu(x)-\nu(x)|.$$
Since $\sum_{y\in S}p(x,y)=1$ for every $x$, the bracketed factor on the right side equals $1-\theta=\lambda<1$. It follows that
$$\sum_{y\in S}\left|\mathbb{P}_{\mu}\{X_1=y\}-\mathbb{P}_{\nu}\{X_1=y\}\right|\leq\lambda\sum_{x\in S}|\mu(x)-\nu(x)|.$$
(b) Let $\mu_n(x)=\mathbb{P}_{\mu}\{X_n=x\}$. Then $\mu_{n+1}(x)=\mathbb{P}_{\mu_n}\{X_1=x\}$ and $\mu_n(x)=\mathbb{P}_{\mu_{n-1}}\{X_1=x\}$. By (a),
$$\sum_{x\in S}\left|\mu_{n+1}(x)-\mu_n(x)\right|\leq\lambda\sum_{x\in S}\left|\mu_n(x)-\mu_{n-1}(x)\right|.$$
It follows that
$$\sum_{x\in S}\left|\mu_{n+1}(x)-\mu_n(x)\right|\leq\lambda^n\sum_{x\in S}\left|\mu_1(x)-\mu(x)\right|\leq 2\lambda^n.$$
Since $0\leq\lambda<1$, the sequence $\{\mu_n\}$ is Cauchy in the total variation norm, hence it converges to a distribution $\pi$, which is obviously stationary. By the same argument,
$$\sum_{y\in S}\left|\mathbb{P}_{\mu}\{X_n=y\}-\pi(y)\right|=\sum_{y\in S}\left|\mathbb{P}_{\mu}\{X_n=y\}-\mathbb{P}_{\pi}\{X_n=y\}\right|\leq 2\lambda^n.$$
If $\sigma$ is another stationary distribution, then
$$\sum_{y\in S}|\sigma(y)-\pi(y)|=\sum_{y\in S}\left|\mathbb{P}_{\sigma}\{X_n=y\}-\mathbb{P}_{\pi}\{X_n=y\}\right|\leq 2\lambda^n\longrightarrow 0.$$
Hence the stationary distribution of the Markov chain is unique.
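The geometric bound in (b) can also be checked numerically. Below is a minimal sketch with an illustrative 3-state transition matrix whose first column is bounded below by $\theta=0.2$, so that $\lambda=0.8$ (the matrix and the range of $n$ are arbitrary choices, not from the solution):

```python
import numpy as np

# Illustrative 3-state chain: every row puts mass >= theta = 0.2 on state 0.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.1, 0.6]])
theta = P[:, 0].min()
lam = 1.0 - theta

# Stationary distribution: normalized left eigenvector of P for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi = pi / pi.sum()

mu = np.array([1.0, 0.0, 0.0])   # start from a point mass
for n in range(1, 11):
    mu = mu @ P                  # distribution of X_n
    assert np.abs(mu - pi).sum() <= 2 * lam ** n + 1e-12
print("bound sum_y |P_mu{X_n = y} - pi(y)| <= 2*lambda^n verified for n <= 10")
```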
Problem 5. Consider a linear regression model with $p$ predictors and $n$ observations:
$$\mathbf{Y}=X\beta+\mathbf{e},$$
where $X_{n\times p}$ is the design matrix, $\beta$ is the unknown coefficient vector, and the random error vector $\mathbf{e}$ has a multivariate normal distribution with mean zero and $\operatorname{Var}(\mathbf{e})=\sigma^2 I_n$ ($\sigma^2>0$ unknown and $I_n$ is the identity matrix). Here $\operatorname{rank}(X)=k\leq p$, $p$ may or may not be greater than $n$, but we assume $n-k>1$. Let $\mathbf{x}_1=\left(x_{1,1},\ldots,x_{1,p}\right)$ be the first row of $X$ and define
$$\gamma=\frac{\mathbf{x}_1\beta}{\sigma}.$$
Find the uniformly minimum variance unbiased estimator (UMVUE) of $\gamma$ or prove it does not exist.
Solution: The key points in the solution are the following.
(i) Any least squares estimator, say $\hat{\beta}$, of $\beta$ is independent of $\hat{\sigma}^2=\|\mathbf{Y}-X\hat{\beta}\|^2/(n-k)$.
(ii) $\mathbf{x}_1\beta$ is clearly estimable.
(iii) Based on (i) and (ii), we can construct an unbiased estimator, say $\hat{\gamma}$, of $\gamma$ in terms of $\hat{\beta}$ and $\hat{\sigma}^2$, and consequently we know the estimator is a function of $X^T\mathbf{Y}$ and $\|\mathbf{Y}-X\hat{\beta}\|^2$.
(iv) In fact, $\left(X^T\mathbf{Y},\|\mathbf{Y}-X\hat{\beta}\|^2\right)$ is a complete and sufficient statistic, and we conclude $\hat{\gamma}$ is the UMVUE of $\gamma$. More details are given below.
Let $\hat{\beta}=\left(X^TX\right)^{-}X^T\mathbf{Y}$ be a least squares estimator of $\beta$, where $\left(X^TX\right)^{-}$ denotes any generalized inverse of $X^TX$. Let $\theta=\mathbf{x}_1\beta$, which is clearly estimable. By the Gauss-Markov theorem, $\hat{\theta}:=\mathbf{x}_1\hat{\beta}$ is the best linear unbiased estimator of $\theta$. For the unbiased estimator $\hat{\sigma}^2=\|\mathbf{Y}-\hat{\mathbf{Y}}\|^2/(n-k)$, we know $(n-k)\hat{\sigma}^2/\sigma^2$ has a $\chi^2_{n-k}$ distribution, which belongs to the Gamma family. Thus, it is readily seen that $E(1/\hat{\sigma})=C/\sigma$, where $C$ is a known constant, $C=\sqrt{n-k}\,\Gamma\!\left(\frac{n-k-1}{2}\right)/\left(\sqrt{2}\,\Gamma\!\left(\frac{n-k}{2}\right)\right)$.
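Indeed, writing $m=n-k$ and $V=(n-k)\hat{\sigma}^2/\sigma^2\sim\chi^2_m$, so that $\hat{\sigma}=\sigma\sqrt{V/m}$,
$$E\!\left(V^{-1/2}\right)=\int_0^\infty v^{-1/2}\,\frac{v^{m/2-1}e^{-v/2}}{2^{m/2}\Gamma(m/2)}\,dv=\frac{2^{(m-1)/2}\Gamma\!\left(\frac{m-1}{2}\right)}{2^{m/2}\Gamma\!\left(\frac{m}{2}\right)}=\frac{\Gamma\!\left(\frac{m-1}{2}\right)}{\sqrt{2}\,\Gamma\!\left(\frac{m}{2}\right)},$$
hence $E(1/\hat{\sigma})=\frac{\sqrt{m}}{\sigma}\,E\!\left(V^{-1/2}\right)=C/\sigma$ with the constant $C$ above; the assumption $n-k>1$ guarantees the integral is finite.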
Let $\hat{\gamma}=\hat{\theta}/(C\hat{\sigma})$. Let $H=X\left(X^TX\right)^{-}X^T$ denote the projection matrix. Clearly, $\left(I_n-H\right)X=0$, which implies $\operatorname{Cov}\!\left(\left(X^TX\right)^{-}X^T\mathbf{Y},\left(I_n-H\right)\mathbf{Y}\right)=0$. Together with the Gaussian error assumption, we know $\left(X^TX\right)^{-}X^T\mathbf{Y}$ and $\left(I_n-H\right)\mathbf{Y}$ are independent. It follows that $\hat{\beta}$ (any choice) and $\hat{\sigma}^2$ are independent. This leads to the unbiasedness of $\hat{\gamma}$: $E\hat{\gamma}=E(\hat{\theta})\,E\!\left(\frac{1}{C\hat{\sigma}}\right)=\mathbf{x}_1\beta\cdot\frac{1}{\sigma}=\gamma$.
With elementary simplifications, based on basic exponential family properties, we see that $T=\left(X^T\mathbf{Y},\|\mathbf{Y}-\hat{\mathbf{Y}}\|^2\right)$ is a complete and sufficient statistic. Since $\hat{\gamma}$ is unbiased and a function of this complete and sufficient statistic, it must be the UMVUE of $\gamma$ by the Lehmann-Scheffé theorem.
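The unbiasedness of $\hat{\gamma}$ can be illustrated by simulation. The sketch below uses an illustrative full-column-rank design matrix, coefficient vector, and sample sizes (all arbitrary choices, not from the solution) and averages $\hat{\gamma}$ over repeated samples:

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma = 20, 3, 2.0
X = rng.standard_normal((n, p))           # illustrative design; full column rank, so k = p
beta = np.array([1.0, -2.0, 0.5])
k = np.linalg.matrix_rank(X)
# C = sqrt(n-k) * Gamma((n-k-1)/2) / (sqrt(2) * Gamma((n-k)/2)), via log-gamma for stability
C = math.exp(0.5 * math.log(n - k) + math.lgamma((n - k - 1) / 2)
             - 0.5 * math.log(2) - math.lgamma((n - k) / 2))
gamma_true = X[0] @ beta / sigma

reps, total = 20_000, 0.0
for _ in range(reps):
    Y = X @ beta + sigma * rng.standard_normal(n)
    beta_hat = np.linalg.pinv(X) @ Y      # one least squares solution
    sigma_hat = math.sqrt(np.sum((Y - X @ beta_hat) ** 2) / (n - k))
    total += (X[0] @ beta_hat) / (C * sigma_hat)
print(total / reps, "vs", gamma_true)     # the two numbers should nearly agree
```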
Problem 6. Let $X_1,\ldots,X_{2022}$ be independent random variables with $X_i\sim N\left(\theta_i,i^2\right)$, $1\leq i\leq 2022$. For estimating the unknown mean vector $\theta\in\mathbf{R}^{2022}$, consider the loss function $L(\theta,\mathbf{d})=\sum_{i=1}^{2022}\left(d_i-\theta_i\right)^2/i^2$. Prove that $\mathbf{X}=\left(X_1,\ldots,X_{2022}\right)$ is a minimax estimator of $\theta$.
Solution: We show that $\mathbf{X}$, as an equalizer (constant risk), achieves the limit of the Bayes risks under a suitable sequence of priors. First, consider independent priors $\theta_i\sim N\left(0,\tau^2\right)$, $1\leq i\leq 2022$. Then the Bayes estimator $\delta_\tau$ has $i$-th component (estimator of $\theta_i$) equal to the posterior mean $E_\tau\left(\theta_i\mid\mathbf{X}\right)=\frac{X_i/i^2}{1/\tau^2+1/i^2}$. The associated Bayes risk is $R_\tau\left(\delta_\tau\right)=\sum_{i=1}^{2022}i^{-2}\frac{1}{1/\tau^2+1/i^2}$. Clearly, as $\tau\to\infty$, $R_\tau\left(\delta_\tau\right)\to\sum_{i=1}^{2022}1=2022$, which is identical to the constant risk (and hence the Bayes risk) of $\mathbf{X}$. This implies that $N\left(0,\tau^2\right)$ with $\tau\to\infty$ gives a least favorable sequence of priors and $\mathbf{X}$ is minimax.
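In detail, for any estimator $\delta$ and any $\tau>0$,
$$\sup_{\theta}R(\theta,\delta)\;\geq\;R_\tau(\delta)\;\geq\;R_\tau\left(\delta_\tau\right),$$
where $R_\tau$ denotes the Bayes risk under the prior above. Letting $\tau\to\infty$ gives $\sup_\theta R(\theta,\delta)\geq 2022=\sup_\theta R(\theta,\mathbf{X})$, so no estimator has smaller worst-case risk than $\mathbf{X}$; that is, $\mathbf{X}$ is minimax.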